Overview

• Patterns of human genetic variation 
-Among populations 

-Among individuals 

- How evolutionary factors influence variation 

• “Race” and its biomedical implications 

• Linkage disequilibrium, evolution, and 
disease-gene identification 


The “four major factors of evolution” 

• Mutation: the author of variation 

• Natural selection: the editor 

• Genetic drift: the randomizer 

• Gene flow: the homogenizer 


Sewall Wright, 1956, Cold Spring Harbor Symp. Quant. Biol. 20: 16-24 
Mutation and Genetic Variation

Human mutation rate is 1.0-1.5 × 10^-8 per bp
per generation: we transmit ~30 new DNA
variants with each gamete

(J. Roach et al., 2010, Science; D. Conrad et al., 2011, Nature Genetics)
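
The per-gamete figure follows from multiplying the per-bp rate by the haploid genome size. A minimal sketch of that arithmetic, assuming a round haploid genome size of 3 billion bp (the function and constant names are illustrative, not from the slides):

```python
# Expected number of new single-nucleotide mutations transmitted per gamete.
# Assumption: haploid genome size of ~3 billion bp (a round figure).
GENOME_BP = 3_000_000_000


def new_mutations_per_gamete(rate_per_bp, genome_bp=GENOME_BP):
    """Per-bp, per-generation mutation rate times the number of bp."""
    return rate_per_bp * genome_bp


low = new_mutations_per_gamete(1.0e-8)
high = new_mutations_per_gamete(1.5e-8)
print(f"{low:.0f} to {high:.0f} new variants per gamete")
```

The lower bound of the rate gives the ~30 variants quoted on the slide; the upper bound gives about 45.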

“The capacity to blunder slightly is the real marvel 
of DNA. Without this special attribute, we would 
still be anaerobic bacteria and there would be no 
music. ” 

- Lewis Thomas 


Single-gene mutations increase with paternal age: at 
least 75% of new mutations occur in male germline 


An additional two mutations occur with each year of paternal
age (baseline: ~30 mutations in a male aged 30)

(Kong et al., 23 Aug. 2012, Nature)

[Figure: number of new mutations plotted against paternal age]
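
The paternal-age statement can be read as a simple linear model. The sketch below only restates the slide's two numbers (baseline of ~30 at age 30, two more per year); the function name and the assumption of strict linearity are illustrative:

```python
def expected_paternal_mutations(paternal_age, baseline_age=30.0,
                                baseline_count=30.0, per_year=2.0):
    """Linear model: baseline count plus a fixed increment per year of age."""
    return baseline_count + per_year * (paternal_age - baseline_age)


for age in (20, 30, 40, 50):
    print(age, expected_paternal_mutations(age))
```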
How much do we differ?

(number of aligned DNA base differences) 


• Identical twins 

• Unrelated humans 

• Human vs. chimp 

• Human vs. mouse 



1/6-1/3 


• 3 billion DNA bases -> 3 million differences (single 
nucleotide variants [SNVs]) between each pair of 
haploid human DNA sequences 


Relative diversity in great apes 



Average number of SNVs per individual

• Orangutans: 9.3 million
• Gorillas: 6.5 million
• Chimpanzees: 5.7 million
• Humans: 3-4 million


As a species, humans have relatively low diversity 


(Prado-Martinez et al., 2013, Nature) 
Copy number variants (deletions/duplications > 50 bp)
account for more inter-individual variation than do
single-nucleotide variants


The conventional view is that we have two copies of all genes except those on the sex chromosomes... 



[Figure: gene copy number]


In an average haploid human sequence, ~9 Mb are affected by structural variants; 3.6 Mb are affected by
SNVs; on average, humans are heterozygous for ~150 CNVs (Sudmant et al., 2015, Nature)




How much do human populations differ? 


[Map: sampled populations with sample sizes, including Bolivian (23), Tongan (13), Bambara (25), Dogon (24), Buryat (25), Kyrgyzstan (25), Iraqi Kurds (25), Pakistanis (25), Nepalese (25), Thai]
Allele frequencies in populations

Population   SNV 1   SNV 2   SNV 3
1            0.588   0.890   0.880
2            0.671   0.559   0.528
3            0.792   0.790   0.828


Average heterozygosity: for each locus, 
obtain the proportion of heterozygous 
individuals by direct counting; average 
across loci 
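
Direct counting can be sketched in a few lines. The genotypes below are made-up toy data, and the function name is illustrative:

```python
def average_heterozygosity(genotypes_by_locus):
    """Each locus is a list of (allele1, allele2) genotypes, one per individual.

    For every locus, count the fraction of heterozygous individuals,
    then average those fractions across loci.
    """
    per_locus = []
    for genotypes in genotypes_by_locus:
        heterozygotes = sum(1 for a, b in genotypes if a != b)
        per_locus.append(heterozygotes / len(genotypes))
    return sum(per_locus) / len(per_locus)


# Toy data (hypothetical): two loci typed in four individuals.
toy_loci = [
    [("A", "G"), ("A", "A"), ("G", "G"), ("A", "G")],  # 2 of 4 heterozygous
    [("C", "C"), ("C", "T"), ("C", "C"), ("C", "C")],  # 1 of 4 heterozygous
]
h = average_heterozygosity(toy_loci)
print(h)  # (0.5 + 0.25) / 2
```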


1/1000 bp varies between a pair of 
individuals: how is this variation 
distributed between continents? 


$F_{ST} = (H_T - H_S) / H_T$

$F_{ST}$ is the amount of genetic variation that is due to population differences

$H_T$ is the total heterozygosity (variation) in the sample

$H_S$ is the average heterozygosity within each population (continent)

$F_{ST} = 0$: All variation exists within populations; none exists between
$F_{ST} = 1$: All variation exists between populations
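
As a rough illustration of how these quantities combine, the sketch below computes F_ST for one biallelic SNV from population allele frequencies. It makes assumptions the slides do not state: heterozygosity is taken as the expected value 2p(1 - p) rather than counted directly, and populations are weighted equally. The function names are illustrative.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity for a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """F_ST = (H_T - H_S) / H_T with equally weighted populations."""
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic in the pooled sample
    return (h_t - h_s) / h_t


snv1 = [0.588, 0.671, 0.792]  # SNV 1 frequencies from the table above
print(round(fst(snv1), 4))
```

Identical frequencies in every population give 0, and frequencies of 0 and 1 in two populations give 1, matching the two limiting cases listed above.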
How is genetic variation distributed
among continental populations?

                                         60 STRs   100 Alus   75 L1s   250K SNPs
Between individuals, within continents   90%       86%        88%      88%
Between continents (F_ST)                10%       14%        12%      12%

F_ST: proportion of variation attributed
to population subdivision

Jorde et al., 2000, Am. J. Hum. Genet.
J. Xing et al., 2009, Genome Res.

How is genetic variation distributed
among continental populations?

                                         60 STRs   100 Alus   75 L1s   250K SNPs   Skin pigmentation
Between individuals, within continents   90%       86%        88%      88%         10%
Between continents (F_ST)                10%       14%        12%      12%         90%

Jorde et al., 2000, Am. J. Hum. Genet.
J. Xing et al., 2009, Genome Res.
% common SNPs shared among four major
regions (Africa, Europe, E. Asia, India):
250K chip results for ~1,000 samples

Minor allele present in:

All 4 groups             78.6%
At least 3 groups        88.0%
At least 2 groups        92.1%
Africa only               7.4%
Any non-African group     0.5%

No SNPs were fixed present in one population, fixed
absent in another

J. Xing et al., 2010, Genomics


Rare single nucleotide variants (SNVs) are 
much more likely to be population-specific 


Common SNPs 


New rarer SNVs 
identified by 
sequencing 


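[Venn diagram: SNV counts shared among population samples (one set labeled CHB+JPT); the counts are not reliably recoverable from the extracted text]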




Average allele frequency 
difference between 
populations: 15% 


<5% of alleles with 
frequency < 2% are 
shared across 
continents 


Durbin et al., 2010, Nature
Auton et al., 2015, Nature
(1000 Genomes Project) 
Rare copy number variants are population-specific
(1000 Genomes data)

[Figure: proportion (0.00-1.00) plotted against variant allele count (10, 100, 1,000; log scale)]

AFR=African
AMR=Native American
EAS=East Asian
EUR=European
SAS=South Asian
Shared=present in >1 superpopulation
All=present in all superpopulations

Sudmant et al., 2015, Nature


A simple genetic distance to 
measure population differences 

$D_{ij} = |p_i - p_j|$

$D_{ij}$ is the genetic distance between populations i
and j; $p_i$ and $p_j$ are the allele frequencies of a
SNV in populations i and j.

Pop.   SNV 1   SNV 2   SNV 3
1      0.588   0.890   0.880
2      0.671   0.559   0.528
3      0.792   0.790   0.828

Example (SNV 1, populations 1 and 2): |0.588 - 0.671| = 0.083
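
The same distance can be computed for every pair of populations. A minimal sketch using the SNV 1 frequencies from the table (the dictionary layout and function name are illustrative):

```python
from itertools import combinations

# SNV 1 allele frequencies for populations 1-3, taken from the table above.
snv1_freqs = {1: 0.588, 2: 0.671, 3: 0.792}


def genetic_distance(p_i, p_j):
    """Absolute difference in allele frequency between two populations."""
    return abs(p_i - p_j)


distances = {
    (i, j): genetic_distance(snv1_freqs[i], snv1_freqs[j])
    for i, j in combinations(sorted(snv1_freqs), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

With several SNVs, one common choice (an assumption here, not stated on the slide) is to average the per-SNV distances.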
Building a population network

Pop.   SNV 1
1      0.588
2      0.671
3      0.792

$|p_1 - p_2|$          $|p_3 - (p_1 + p_2)/2|$
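
One way to read the two expressions above is as a two-step clustering: join the closest pair of populations, then measure the remaining population against the average of that pair. The sketch below follows that reading, which is an interpretation of the slide rather than a documented algorithm; variable names are illustrative.

```python
from itertools import combinations

snv1_freqs = {"1": 0.588, "2": 0.671, "3": 0.792}

# Step 1: find the pair of populations with the smallest distance.
closest_pair = min(
    combinations(snv1_freqs, 2),
    key=lambda pair: abs(snv1_freqs[pair[0]] - snv1_freqs[pair[1]]),
)
pair_distance = abs(snv1_freqs[closest_pair[0]] - snv1_freqs[closest_pair[1]])

# Step 2: distance from the remaining population to the average of the pair.
remaining = next(pop for pop in snv1_freqs if pop not in closest_pair)
cluster_mean = (snv1_freqs[closest_pair[0]] + snv1_freqs[closest_pair[1]]) / 2
join_distance = abs(snv1_freqs[remaining] - cluster_mean)

print(closest_pair, round(pair_distance, 3))
print(remaining, round(join_distance, 4))
```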



Percent agreement between Supreme Court justices 
(New York Times, 2014) - analogous to % alleles 
shared among individuals 


Ruth Bader Ginsburg 
Sonia Sotomayor 
Elena Kagan 
Stephen Breyer 
Anthony Kennedy 
John Roberts 
Antonin Scalia 
Samuel Alito 
Clarence Thomas 
Principal components analysis (PCA):
a multidimensional regression technique

[Figure: two-variable scatter plot; both axes run from 0 to about 0.8]
Genetic similarity between two people can
be completely described with a line

Genetic similarities among three people can
be completely described with a plane (two
dimensions)
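
With many people and many SNVs, PCA finds the few axes that capture most of the variation. The sketch below extracts only the first principal component by power iteration on the covariance matrix, in plain Python. The genotype matrix is made-up toy data (allele counts 0/1/2), and the function name is illustrative; real analyses would use a numerical library and many more markers.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, fraction of variance explained, scores) for PC1.

    rows: one list of numeric values (e.g. allele counts) per individual.
    Limitation: assumes the data are not constant and that the starting
    vector is not orthogonal to the leading eigenvector.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]

    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    total_variance = sum(cov[a][a] for a in range(m))
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, eigenvalue / total_variance, scores


# Toy genotypes (hypothetical): 4 individuals x 4 SNVs; the first two
# individuals resemble each other, as do the last two.
genotypes = [
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [2, 2, 2, 1],
    [2, 1, 2, 2],
]
direction, explained, scores = first_principal_component(genotypes)
print(round(explained, 3), [round(s, 3) for s in scores])
```

In this toy example the first component separates the first two individuals from the last two, which is the kind of grouping the PCA plots in these slides display.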
Principal components analysis of Supreme
Court decision-making agreement

[Figure: PCA scatter plot, PC1 (-15 to 15) vs. PC2; points labeled Breyer, Alito, Kennedy, Roberts, Thomas, Sotomayor, Ginsburg, Kagan, Scalia]

Thanks to: Julie Feusier, PhD-to-be


Population relationships based on
100 autosomal Alu polymorphisms

[Figure: population tree with bootstrap support levels; labels include Ancestral, Biaka Pygmy, Mbuti Pygmy, Alur, Nande, Indian upper castes, middle castes, lower castes and tribals, Poles, N. European, French, Vietnamese, Chinese, Cambodian, Malay, Japanese]

Watkins et al., 2003, Genome Res. 13: 1607-18
[Figure: population tree rooted at "Ancestral", colored by region (Africa, America, Central Asia, East Asia, Europe, Polynesia, West Asia). Tip labels: Pygmy, !Kung, Pedi, Nguni, Sotho/Tswana, Dogon, Bambaran, YRI, Alur, Luhya, Hema, N. European, CEU, Slovenian, Tuscan, Urkarah, Stalskoe, Iraqi Kurds, Pakistani, T.N. Brahmin, A.P. Brahmin, A.P. Mala, T.N. Dalit, A.P. Madiga, Irula, Nepalese, Bolivian, Totonac, Kyrgyzstan, Buryat, JPT, Japanese, CHB, Chinese, Vietnamese, Cambodian, Tongan, Samoan]

40 populations,
~250K SNVs

Xing et al., 2010, Genomics


Similar patterns seen in Human Genome 
Diversity Project data 



525,910 SNPs 


396 copy number variants (CNVs) 


Jakobsson et al., 2008, Nature 451: 998-1003 
Haplotype diversity declines
with distance from Africa 



J. Xing et al., 2010, Genomics 


Principal components analysis (PCA) displays individual genetic 
similarity in 2D: each dot = 1 individual 



[Figure: PCA scatter plot of 850 individuals, colored by region: Africa, America, Central Asia, East Asia, Europe, Polynesia, West Asia]


Xing et al., 2010, Genomics 
PCA: Eurasian Populations

25 populations, 554 individuals

[Figure: PCA scatter plot; labeled populations include CEU, Utah N. European, N. European, Slovenian, Tuscan, Urkarah, Stalskoe, Iraqi Kurds, Pakistanis, T.N. Brahmin, A.P. Brahmin, A.P. Mala, A.P. Madiga, T.N. Dalit, Irula, Nepalese, Kyrgyzstanis, Buryats, CHB, Chinese, JPT, Japanese, Thai, Iban, Cambodian, Vietnamese]

Xing et al., 2010, Genomics 


Serial founder effect: genetic drift 
increases with distance from Africa 


[Figure: two panels plotted against longitude (deg), -50 to 300]

Colonna et al., 2011, Genome Biol.
Recent African origin of anatomically modern humans

[Map: initial human expansion across the globe, prehistorical migrations and admixture, and historical migrations and admixture (including admixture in the Americas). Legend: gradients in SNP haplotype diversity; sample locations of populations noted in text; inferred bottleneck events; location of sampled aDNA. Visible labels include "15-35 kya", "Finnish", and "European".]

Novembre J, Ramachandran S. 2011.
Annu. Rev. Genomics Hum. Genet. 12:245-74


PCA can distinguish closely related populations: 
1 million SNP microarray 



[Figure: PCA scatter plot; x-axis PC 1 (8.35%). Legend: Buryat Mongolian, CHB, JPT, Kyrgyzstan, Qinghai Mongolian, Maduo Tibetan 2009, TuoTuo River Tibetan 2010]


Xing et al., 2013 PLoS Genetics 
Principal components analysis of 3,000 Europeans
(500,000 SNPs)

J. Novembre et al., 2008, Nature


Genetic distance analysis: 15 loci 


[Figure: genetic distance plot; labeled populations: Iceland, Norway, Netherlands, Mormon, England, U.S., Sweden, Denmark, Germany, Switzerland, France, Spain, Finland, Poland, Italy]


McLellan, Jorde, and Skolnick, 1984, Am. J. Hum. Genet. 
Sequence data permit more accurate
inferences about population history 

• Microarray SNPs are selected for higher 
frequency and diversity in Europeans 

• Complete DNA sequences are unbiased 
and include information about rare variants 

• Coalescence methods can be used 
effectively with sequence data 


The effect of ascertainment bias on allele frequencies: 
Microarray data cannot accurately estimate demographic 
parameters (population size, growth rates) 



15: 1496-1502 


April 6,2016 
Allele frequency spectrum (2,440 exomes)
indicates a recent population expansion

[Figure: allele frequency spectrum by minor allele frequency (MAF, %) bin]

73% of all protein-coding SNVs and 86% of
deleterious SNVs arose within past 5,000-10,000
years (Fu et al., 2013, Nature 493: 216-20; Tennessen et al., 2012, Science)


Population expansions increase
the frequency of rare variants

[Figure: pedigree diagram]

Extinction probability of a new variant = (1/2)^2 = 1/4
Population expansions increase
the frequency of rare variants

[Figure: pedigree diagram]

Extinction probability of a new variant = (1/2)^10 = 1/1024
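
The two extinction probabilities come from the same rule. A minimal sketch, assuming the new variant sits on one chromosome of a single parent and that each child independently inherits it with probability 1/2 (the function name is illustrative):

```python
from fractions import Fraction


def extinction_probability(n_offspring):
    """Probability that none of n offspring inherits a heterozygous parent's new variant."""
    return Fraction(1, 2) ** n_offspring


for n in (2, 10):
    print(n, extinction_probability(n))
```

Larger families make immediate loss much less likely, which is why an expanding population retains more rare variants.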



The 1000 Genomes Project 

A global reference for human 
genetic variation 

The 1000 Genomes Project Consortium* 


The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by 
applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report
completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination
of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We
characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide
polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased
onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of 
ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for 
common disease studies. 



Auton et al., 2015, Nature 
The spectrum of human genetic
variation

Table 1 | Median autosomal variant sites per genome
(each cell: variant sites / singletons)

                  AFR             AMR             EAS             EUR             SAS
Samples           661             347             504             503             489
Mean coverage     8.2             7.6             7.7             7.4             8.0

SNPs              4.31M / 14.5k   3.64M / 12.0k   3.55M / 14.8k   3.53M / 11.4k   3.60M / 14.4k
Indels            625k / -        557k / -        546k / -        546k / -        556k / -
Large deletions   1.1k / 5        949 / 5         940 / 7         939 / 5         947 / 5
CNVs              170 / 1         153 / 1         158 / 1         157 / 1         165 / 1
MEI (Alu)         1.03k / 0       845 / 0         899 / 1         919 / 0         889 / 0
MEI (L1)          138 / 0         118 / 0         130 / 0         123 / 0         123 / 0
MEI (SVA)         52 / 0          44 / 0          56 / 0          53 / 0          44 / 0
MEI (MT)          5 / 0           5 / 0           4 / 0           4 / 0           4 / 0
Inversions        12 / 0          9 / 0           10 / 0          9 / 0           11 / 0
Nonsynon          12.2k / 139     10.4k / 121     10.2k / 144     10.2k / 116     10.3k / 144
Synon             13.8k / 78      11.4k / 67      11.2k / 79      11.2k / 59      11.4k / 78
Intron            2.06M / 7.33k   1.72M / 6.12k   1.68M / 7.39k   1.68M / 5.68k   1.72M / 7.20k
UTR               37.2k / 168     30.8k / 136     30.0k / 169     30.0k / 129     30.7k / 168
Promoter          102k / 430      84.3k / 332     81.6k / 425     82.2k / 336     84.0k / 430
Insulator         70.9k / 248     59.0k / 199     57.7k / 252     57.7k / 189     59.1k / 243
Enhancer          354k / 1.32k    295k / 1.05k    289k / 1.34k    288k / 1.02k    295k / 1.31k
TFBSs             927 / 4         759 / 3         748 / 4         749 / 3         765 / 3
Filtered LoF      182 / 4         152 / 3         153 / 4         149 / 3         151 / 3
HGMD-DM           20 / 0          18 / 0          16 / 1          18 / 2          16 / 0
GWAS              2.00k / 0       2.07k / 0       1.99k / 0       2.08k / 0       2.06k / 0
ClinVar           28 / 0          30 / 1          24 / 0          29 / 1          27 / 1

See Supplementary Table 1 for continental population groupings. CNVs, copy-number variants; HGMD-DM, Human Gene Mutation Database disease mutations; k, thousand; LoF, loss-of-function; M, million; MEI,
mobile element insertions.

Auton et al., 2015, Nature 


Variation in individuals: 1000 Genomes Project 



Auton et al., 2015, Nature 
A “typical” human genome

Protein truncating                        149-182
Peptide altering                          10,000-12,000
Regulatory (UTR, TFBS, promoter, etc.)    459,000-565,000
Associated with complex trait             ~2,000
ClinVar disease causing                   24-30


Simons Genome Diversity Project (SGDP): 300 
individuals in 142 populations; 40x sequencing 



Sudmant et al., 2015, Science 
Copy number variation in SGDP samples

[Figure panels: duplications; heterozygous CNVs (deletions); heterozygous CNVs (duplications); heterozygous SNVs]

Sudmant et al., 2015, Science
A sequence-based demographic model, with line width corresponding to
population size and time flowing from left to right 



Recent expansion explains excess of rare 
alleles 


Gravel S et al. 2011 PNAS 108:11983-11988 


Neandertal admixture with anatomically modern humans: 
On average, non-Africans have 1-4% Neandertal DNA 



B Vernot and J M Akey, 2014 Science 
Sequence-based reconstruction of
Ashkenazi Jewish demographic history

[Figure: demographic model with time in years ago running to the present; branches include AJ (Ashkenazi Jewish). Values visible in the extracted text (13,900; 3,900; 21k; 49%; 3,700; 170,000; 23,800) cannot be reliably matched to their parameters.]

Carmi et al., 2014, Nat. Comm.


Drift has increased the frequencies of 
several disease-causing mutations 

• Three founder mutations in BRCA1 or BRCA2 
are seen in 2.5% of Ashkenazi Jews (1/200 in 
general population) 

• APC mutation predisposing to colorectal cancer 
is seen in 6% of Ashkenazi population 

• Several lysosomal storage disorders (Gaucher, 
Niemann-Pick, Tay-Sachs) are relatively 
common 
What can genetics tell us about “race”?

“‘Race’ is biologically meaningless”
- Schwartz, 2001, N. Engl. J. Med.

“I am a racially profiling doctor”
-- Satel, May 5, 2002, New York Times

[Magazine cover: Scientific American, "Genetic Results May Surprise" (Bamshad and Olson, 2003)]

SCIENCE AND SOCIETY
Taking race out of human genetics
Engaging a century-long debate about the role of race in science
-- Yudell et al., 2016, Science
“It may be doubted whether any character
can be named which is distinctive of a race
and is constant.”

-- Charles Darwin, 1871, The Descent of Man, and
Selection in Relation to Sex
[Figure: two panels, "Height" (axis 55-80) and "Height + waist/hip ratio"]




PCA of genetic distances among 467
individuals: 10 SNPs

[Figure: PCA scatter plot, first PC vs. second PC. Legend: Alur, Brahmin, CEU, CHB, Hema, Iban, Irula, JPT, Japanese, Luhya, Madiga, Mala, N. European, Pygmy, Stalskoe, Tuscan, Urkarah, YRI]
PCA of genetic distances among 467
individuals: 100 SNPs

[Figure: PCA scatter plot, first PC vs. second PC; same 18-population legend (Alur through YRI)]


PCA of genetic distances among 467
individuals: 1000 SNPs

[Figure: PCA scatter plot, first PC vs. second PC, with clusters labeled Europeans, E. Asians, and Africans; same 18-population legend (Alur through YRI)]
PCA of genetic distances among 467
individuals: 10,000 SNPs

[Figure: PCA scatter plot; same 18-population legend (Alur through YRI)]
Population affiliation cannot accurately
predict individual genotypes or traits

[Figure: diagram of individual SNPs]
PCA of 1000 Genomes data,
including African-Americans

[Figure: PCA scatter plot; x-axis PC1 (7.92%). Legend: African Americans (ASW), CEPH (CEU), Chinese (CHB), Finns (FIN), Japanese (JPT), Kenyans (LWK), Nigerians (ESN), Nigerians (YRI), Tuscans (TSI), Vietnamese (KHV)]
Ancestry vs. Race

[Figure: two individuals, each labeled “African-American”, with differing proportions of European, African, and Native American ancestry]
[Screenshot: 23andMe "Paternal Line" report]

Your Y chromosome DNA determines your paternal haplogroup.

Paternal Haplogroup: I1*
I1* is a subgroup of I1, which is described below.

[Map: locations of haplogroup I1 circa 500 years ago, before the era of intercontinental travel; frequency scale 0% to 100%]

Haplogroup I1 can be found at levels of 10% and higher in many parts of Europe, due to its
expansion with men who migrated northward after the end of the Ice Age about 12,000
years ago. It reaches its highest levels in Denmark and the southern parts of Sweden and
Norway.

Haplogroup: I1
Age: 28,000 years
Region: Northern Europe
Populations: Finns, Norwegians, Swedes
Highlight: Haplogroup I1 reaches highest frequencies in Scandinavia.

Other haplogroups shown for comparison: D2a1b (Japanese person), E1b1a8a... (Nigerian person), N (Chinese person), C3 (Genghis Khan)

[Screenshot: 23andMe "Maternal Line" report]

Your mitochondrial DNA determines your maternal haplogroup.

Maternal Haplogroup: U8a
U8a is a subgroup of U8, which is described below.

[Map: locations of haplogroup U8 circa 500 years ago, before the era of intercontinental travel]

Haplogroup: U8
Age: 50,000 years
Region: Europe, Near East, northern Africa
Populations: Basques, Finns
Highlight: Haplogroup U8 entered Europe with the first modern humans to inhabit the continent.

Haplogroup U8 arose in the Near East about 50,000 years ago and moved into Europe not
long afterward, along with the first modern humans to inhabit the continent. Limited to a few
scattered locations during the Ice Age, another migration carried the haplogroup out of the
Iberian Peninsula into central and northern Europe after climate conditions began
improving about 15,000 years ago.

Other haplogroups shown for comparison: D4e2 (Japanese person), D5a* (Chinese person), L3e (Nigerian person), U8a (Lynn Jorde), H (Marie Antoinette), H4a (Warren Buffett), T2 (Jesse James), V (Benjamin Franklin, Bono)
[Screenshot: 23andMe "Ancestry Painting" chromosome view]

Trace the ancestry of your chromosomes, one segment at a time. Last updated April 23, 2008.

Solid segments indicate that both chromosomes come from the same geographic region.
Dual-colored segments indicate chromosomes from different geographic regions.

[Screenshot: 23andMe "Ancestry Composition" report for Lynn Jorde, speculative setting, sub-regional resolution]

Ancestry Composition tells you what percent of your DNA comes
from each of 31 populations worldwide. The analysis includes DNA
you received from all of your ancestors, on both sides of your
family. The results reflect where your ancestors lived 500 years
ago, before ocean-crossing ships and airplanes came on the [text truncated in source]

100% European
  Northern European: 88.1% Scandinavian; 5.9% British & Irish; 0.5% French & German; 0.0% Finnish; 5.4% nonspecific Northern European
  Southern European, Eastern European, Ashkenazi, and nonspecific European: 0.0%
0.0% Middle Eastern & North African
0.0% Sub-Saharan African
[Screenshot: 23andMe "Ancestry Painting" for an example individual, "African American Man"; chromosomes painted by region]

Most African Americans today trace a large part of
their ancestry to sub-Saharan Africa as a result of
the slave trade. Over the generations since, both
Europeans and Native Americans have
intermarried with African Americans and
contributed ancestry, as seen in the ancestry
painting of this man, self-identified as African
American. In fact, one of this man's chromosomes
appears to be fully European across the whole
genome, so it is likely that one of his parents was
European.

Europe 64%
Africa 33%
Asia 4%



What do these findings imply for 
biomedicine? 

• Large numbers of independent DNA 
polymorphisms can inform us about 
ancestry and population history 

• These variants typically differ between 
populations only in their frequency and 
imply substantial overlap between 
populations 
Blood pressure response to ACE inhibitors

(Sehgal, 2004, Hypertension 43: 566-72)

[Figure: mean racial difference in blood pressure response]


EGFR inhibitors and non-small cell 
lung cancer 

• Gefitinib and erlotinib inhibit epidermal 
growth factor receptor (EGFR) tyrosine 
kinase activity 

• Effective in 10% of Europeans, 30% of 
Asians (Japanese, Chinese, Koreans) 

• Somatic mutations in EGFR found in 10% 
of Europeans, 30% of Japanese 

• 70-80% of those with mutations respond to
gefitinib; <10% of those without mutations
respond

Johnson, 2005, Cancer Res. 65: 7525-9; McDermott
et al., 2011, N. Engl. J. Med. 364: 340-50
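
These figures suggest that the population difference in drug response tracks the frequency of EGFR mutations rather than ancestry itself. A rough sketch of that arithmetic, assuming a 75% response rate among mutation carriers (the midpoint of 70-80%) and 10% among non-carriers (the slide's upper bound); both defaults and the function name are illustrative:

```python
def expected_response_rate(mutation_freq, resp_with=0.75, resp_without=0.10):
    """Overall response rate as a mixture of mutation carriers and non-carriers."""
    return mutation_freq * resp_with + (1.0 - mutation_freq) * resp_without


european = expected_response_rate(0.10)  # mutations in ~10% of Europeans
japanese = expected_response_rate(0.30)  # mutations in ~30% of Japanese
print(round(european, 3), round(japanese, 3))
```

Because the non-carrier rate used here is an upper bound, these values overstate the overall response somewhat; the point is that the gap between groups comes almost entirely from the difference in mutation frequency.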
Genetic Variation and “Race”

• Genetic variation is correlated with 
geography and tends to be distributed 
continuously across geographic space 

• “Race” may not be biologically meaningless, 
but it is biologically imprecise 

• Individual ancestry provides more medically 
useful information 


Linkage disequilibrium and disease-gene mapping: 
nonrandom association of alleles at linked loci 


F(A) = 60%    F(a) = 40%
F(B) = 70%    F(b) = 30%

Haplotypes:

               AB     Ab     aB     ab
Equilibrium    42%    18%    28%    12%
Disequilibrium 60%    0%     10%    30%

[Figure: two sets of chromosomes carrying loci A/a and B/b. At equilibrium each haplotype frequency is the product of its allele frequencies (e.g., AB = 60% × 70% = 42%); in disequilibrium AB and ab are over-represented.]
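
The difference between the two cases can be quantified with standard linkage disequilibrium statistics, which the slides do not define explicitly: D = P(AB) - P(A)P(B), its normalized form D', and r². The sketch below applies them to the AB haplotype frequencies of 42% (equilibrium) and 60% (disequilibrium); the function name is illustrative.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the AB haplotype; p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


equilibrium = ld_statistics(0.42, 0.60, 0.70)
disequilibrium = ld_statistics(0.60, 0.60, 0.70)
print([round(x, 3) for x in equilibrium])
print([round(x, 3) for x in disequilibrium])
```

In the disequilibrium case D' equals 1 because one of the four haplotypes (Ab) is absent.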
Over time, more crossovers will occur
between loci located further apart 



B and C will be found together on the same haplotype more 
often than A and B: there is more linkage disequilibrium 
between B and C than A and B 
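
The standard population-genetic model for this decay, not stated on the slide, is D_t = D_0 (1 - r)^t, where r is the recombination fraction between the two loci and t is the number of generations. A minimal sketch with illustrative values:

```python
def ld_after_generations(d0, recombination_fraction, generations):
    """Expected LD coefficient after t generations: D_t = D_0 * (1 - r)^t."""
    return d0 * (1.0 - recombination_fraction) ** generations


# Hypothetical example: the same starting LD, 100 generations later,
# for a tightly linked pair of loci and a more distant pair.
close_pair = ld_after_generations(0.18, 0.001, 100)
distant_pair = ld_after_generations(0.18, 0.01, 100)
print(round(close_pair, 4), round(distant_pair, 4))
```

The closer pair retains more of its initial disequilibrium, which is the pattern described for loci B and C versus A and B.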


Factors that May Affect Linkage 
Disequilibrium Patterns 

• Chromosome location 

- Telomeric vs. centromeric 

- Intragenic vs. extragenic 

• DNA sequence patterns (GC content; presence of Alu 
elements) 

• Recombination hotspots (1 every 50-100 kb) 

- 13-mer bound by PRDM9 associated with 40% of hotspots 

• Evolutionary factors: LD varies among populations 

- Natural selection 

- Gene flow 

- Mutation, gene conversion 

- Genetic drift 

- Time elapsed since founding of population 
Linkage disequilibrium (LD) decays with physical
distance more quickly in “older” populations 



SNPs in disequilibrium are redundant: we 
don’t need to type all of them 


[Figure: aligned sequences for Persons A-E; the variable positions are inherited together as two haplotypes, so a single tag SNP identifies which haplotype each person carries]


For genome-wide association studies, “complete” coverage is given by about 1.6 million 
SNPs for African populations, 600,000 to 1M SNPs for non-African populations 
Published Genome-Wide Associations through 12/2013
Published GWA at p < 5 × 10^-8 for 17 trait categories

[Figure: published associations, colored by trait category]

• Digestive system disease
• Cardiovascular disease
• Metabolic disease
• Immune system disease
• Nervous system disease
• Liver enzyme measurement
• Lipid or lipoprotein measurement
• Inflammatory marker measurement
• Hematological measurement
• Body measurement
• Cardiovascular measurement
• Other measurement
• Response to drug
• Biological process
• Cancer

NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/


Recombination hotspots 

• LD patterns indicate 25,000 - 50,000 
hotspots in human genome (1 every 50 - 
100 kb) (Myers et al., 2005, Science) 

• 60% of all recombination occurs in 6% of
the genome (Coop et al., 2008, Science 319:
1395-8)

• Hotspots are not congruent in human and 
chimpanzee and vary among human 
populations 
Examples of genes in which elevated LD
indicates recent positive selection

Gene                     Phenotype
G6PD                     Malaria protection
CYP3A5                   Sodium retention
LCT (lactase enhancer)   Lactase persistence
SLC24A5                  Skin pigmentation
EPAS1, EGLN1             High-altitude hypoxia response

Voight et al., 2006, PLoS Biology; Simonson et al., 2010, Science; Grossman et al., 2013, Cell
Tibetans have regions of elevated LD and
extended homozygosity in HIF-pathway and
O2-sensing genes

[Figure: horizontally arranged collections of Tibetan and Han Chinese chromosome segments at EPAS1, EGLN1, and HMOX2/NMRAL1]

Yellow = ancestral allele
Red = derived (selected) allele

Simonson et al., 2010, Science
Simonson et al., 2015, Exp. Physiol.


EGLN1 (PHD2) and PPARA haplotypes under positive
selection are associated with reduced hemoglobin

[Figure panels: EGLN1 putatively advantageous haplotypes; PPARA putatively advantageous haplotypes]

Simonson et al., 2010, Science
Lorenzo et al., 2014, Nat. Genet.
Erythroid progenitor cells produce the Tibetan
phenotype under hypoxia

[Figure panels: (A) wild type under normoxia; (B) PHD2 D4E/C127S mutants under normoxia; (C) wild type under hypoxia (5% O2); (D) PHD2 D4E/C127S mutants under hypoxia]

PHD2 D4E/C127S produces a gain of function under hypoxic conditions, reducing hemoglobin
concentration and providing protection from polycythemia.


Composite of Multiple Signals (CMS) test for recent positive selection 


[Figure: genome-wide CMS scores by chromosome; labeled peaks: EGLN1, EPAS1]


Hu et al., Genome Research (under review) 
Population genetics is guiding development
of new sequence analysis resources 

• 1000 Genomes Project 

- Provides “control sequences” for variant analysis 

- Most rare variants are population-specific 

• When is a variant functionally significant? 

- Functional regions show more purifying selection 

(VAAST software: M. Yandell et al., 2011, Genome Res.; pVAAST: Hu et al., 2014,
Nature Biotech.)

- Evolutionary conservation among species; 
especially useful for noncoding DNA 


Population genetics and genome analysis 

• Genetic variation contains useful information 
about population history 

• Genetic variation provides a more informed view 
of “race” and its relevance to medicine 

• Population genetic analysis has been critical in 
understanding linkage disequilibrium and its 
application in disease-gene mapping 

• Population genetics becomes even more critical 
in understanding role of rare variants in disease 

• Population genetics is fun!